ABSTRACT

Four statements formulated after an extensive
literature review on student ratings of instruction are proposed: (1)
we must remember the basic definition of validity, (2) we must
clarify what it is that a particular teacher is trying to do or
proposes to do in a given classroom, (3) we must be clear about
defining what we want to obtain from student ratings, and (4) we must
make a greater effort to measure student performance as a result of
or in spite of what the teacher intended to do and what actually was
done. A taxonomy of kinds of validity is presented. The
identification of validity for a particular purpose, situation, and
group is discussed. The need for a more sophisticated analysis of
student rating data, utilizing more recent statistical tools such as
discriminant analysis and multivariate procedures, is pointed out.
(Author/BJG)
The title of my paper is: "Validity of Student Ratings of
Instruction: Validity for What Purpose and What Kind of
Validity?" My presentation amounts to four statements that
I have formulated after an extensive review of literature on
student ratings, to be published in the near future by Clarke
Irwin of Toronto.



STATEMENT I. 



WE HAVE FORGOTTEN OR IGNORED THE
BASIC DEFINITION OF VALIDITY.
LET'S REMEMBER IT.



STATEMENT II.

WE OFTEN FORGET OR DO NOT KNOW OR
IGNORE WHAT IT IS THAT A PARTICULAR
TEACHER IS TRYING TO DO OR PROPOSES
TO DO IN A GIVEN CLASSROOM. (Course
Syllabi and Course Strategies).



LET'S CLARIFY THAT. 



STATEMENT III. 



WE HAVE TO DEFINE BETTER WHAT WE
WANT TO OBTAIN FROM STUDENT RATINGS.
(Reported events, attitudes, agreements,
judgments, opinions or reactions
or what?)

LET'S BE CLEAR ABOUT THAT.

STATEMENT IV.

WE NEED TO MAKE A GREATER EFFORT TO
MEASURE STUDENT PERFORMANCE AS A
RESULT OF OR IN SPITE OF WHAT THE
TEACHER OR PROFESSOR INTENDED TO DO
AND WHAT ACTUALLY WAS DONE. LET'S
GIVE OUR BEST EFFORTS TO THAT.



Now, Mr. Chairman, with these four givens, I think I should sit
down and open the floor for questions and discussion.




But maybe it would be beneficial to provide some explanations.
So back to Statement I. WHAT IS VALIDITY?

In my frame of reference, "Validity is the ability of an instrument
to measure what it was designed to measure." Now we know,
don't we, that there is no such thing as a valid instrument.
Usually we say that the results of a particular instrument are
valid for a particular purpose, in a particular situation, and
with a particular group.



Here is a TAXONOMY OF KINDS OF VALIDITY.



INSERT FIGURE I

Now there are 24 definitions or kinds of validity which can
be subsumed under four categories which themselves belong to
either of two major types of validity. I ask you, what kinds of
validity are we dealing with when we examine the host of studies
done about and around student ratings? I propose to you
that a study be done to classify reported and unreported
studies according to these twenty-four kinds as a start to
comprehend further the state of validity of student ratings.
It is no easy task and I will give you some examples later.
Which kinds of validity do people have in mind when constructing
their questionnaires or rating forms?

VALID FOR A PARTICULAR PURPOSE:

Aleamoni et al. (1973) have shown that results of student
ratings are different when instructions to students indicate
that the purpose is course improvement on the one hand
or promotion and tenure (P & T) decisions on the other. What does this
tell us about results of student ratings when the purpose
has not been specified or is left vague or is otherwise
ambiguous to the student? A similar study is now being done
at McGill (Levy, 1975; Pascal, Nadeau, Shore) with four
sets of instructions. We are anxious to see the results.
VALID FOR A PARTICULAR SITUATION:

What does this tell us about the validity of data on student
ratings when one uses, uncritically, entire instruments
developed elsewhere for purposes sometimes unknown? I
believe this tells us that an instrument must have content
validity and "Content Agreement" before it can be used with
a particular professor and classroom, with the hope that the
results are going to be valid.

I believe it also tells us that before an instrument is
developed for use in a particular classroom for a particular
group with a particular instructor, one needs to have a
clear definition of the situation in which the instrument
is going to be applied, namely, what is the classroom
organization and management situation? (Is it a lecture,
discussion, seminar, independent study, projects, group
work? The list and combinations are endless.) What are the
teaching and learning activities and strategies? I believe
we need to give answers to these before we can examine if
results of ratings are valid.



VALID FOR A PARTICULAR GROUP:

Now, surely, there should be little quarrel about that.
Recent work by Doyle et al. (1973) and others regarding
student types is getting us to meet that particular
requirement of validity. Also it should be clear that we
need to understand and measure, in increasingly better ways,
the performances of students and student groups as they try
to achieve specified objectives with particular strategies
under specified constraints. To my mind the studies of
relationships of student ratings and student performances
are gravely lacking in those respects. What about the
non-measurement or non-assessment of entry behaviours of
students in courses? What does this do to the ratings and
the post-achievement of students?

When I look at the available validity studies of student
ratings, I believe that answers to the above questions will
bring us closer to identifying specific teacher skills that
need to be developed as part of a teacher's repertoire of
teaching behaviours that will be related to specific
performances of learners.

And finally, I also believe that we need to get a little
more sophisticated in our analysis of student ratings data
by making use of some of our more recent statistical tools,
such as discriminant analysis and multivariate procedures, in
order to get at some aspects of the validity problem. In
this context longitudinal studies are of prime importance.
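
To make the reference to discriminant analysis concrete, here is a minimal sketch in Python of Fisher's two-group linear discriminant applied to two rating items. Everything in it is an assumption made for illustration: the item scores are invented, the two groups (students who did and did not reach a course objective) are hypothetical, and the function names come from no study cited in this paper.

```python
from statistics import mean


def pooled_covariance(group_a, group_b):
    """Pooled within-group covariance (sxx, sxy, syy) for two lists of (x, y) pairs."""
    def scatter(group):
        mx = mean(p[0] for p in group)
        my = mean(p[1] for p in group)
        sxx = sum((p[0] - mx) ** 2 for p in group)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in group)
        syy = sum((p[1] - my) ** 2 for p in group)
        return sxx, sxy, syy

    dof = len(group_a) + len(group_b) - 2
    return tuple((u + v) / dof for u, v in zip(scatter(group_a), scatter(group_b)))


def fisher_discriminant(group_a, group_b):
    """Return (weights, cutoff); a case is assigned to group A when
    weights[0] * x + weights[1] * y exceeds the cutoff."""
    sxx, sxy, syy = pooled_covariance(group_a, group_b)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    mean_a = (mean(p[0] for p in group_a), mean(p[1] for p in group_a))
    mean_b = (mean(p[0] for p in group_b), mean(p[1] for p in group_b))
    dx, dy = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
    # weights = inverse(pooled covariance) * (mean_a - mean_b), written out for 2x2
    weights = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    # cutoff = discriminant score of the midpoint between the two group means
    cutoff = (weights[0] * (mean_a[0] + mean_b[0]) / 2
              + weights[1] * (mean_a[1] + mean_b[1]) / 2)
    return weights, cutoff


# Invented ratings on two items (1-5 scale), for illustration only.
met_objective = [(4, 5), (5, 4), (4, 4), (5, 5)]
missed_objective = [(2, 3), (1, 2), (2, 2), (3, 3)]

weights, cutoff = fisher_discriminant(met_objective, missed_objective)
for x, y in met_objective + missed_objective:
    score = weights[0] * x + weights[1] * y
    print((x, y), "A" if score > cutoff else "B")
```

The point of such an analysis is that the weights show which rating items separate the groups, which a simple comparison of overall mean ratings cannot do.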



Table I gives you some of the studies and their
preoccupations.



INSERT TABLE I 



Investigations of validity of student ratings fall generally
in three categories:

a) Rating form content validity.

b) Correlates of student ratings.

c) Comparisons of student ratings with
ratings of other raters.

Table I gives you a start in your additional search for
understanding of the validity question in the field of
student ratings of instruction. I might add that out of
the 123 validity studies listed, 46 are dated in the 1972-74
period, 40 are in the 1968-71 period, with the remaining
37 studies before 1968. Most of the validity studies (86 of 123)
are therefore relatively recent.
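
The period counts just quoted can be checked with a few lines of Python. This is only an arithmetic sketch; the dictionary name and the period labels are my own, and the three counts are the ones given in the text.

```python
# Counts of validity studies per period, as quoted in the text above.
studies_by_period = {"before 1968": 37, "1968-71": 40, "1972-74": 46}

total = sum(studies_by_period.values())
shares = {period: round(100 * n / total, 1) for period, n in studies_by_period.items()}

# Studies dated 1968 or later.
recent = studies_by_period["1968-71"] + studies_by_period["1972-74"]
recent_share = round(100 * recent / total, 1)

print(total, shares, recent, recent_share)
```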



FIGURE I
KINDS OF VALIDITY



Validity: the state, status, or fact of being valid, sound; quality of being grounded on truth or fact, truthful; in measurement,
the extent to which a test or other measuring device does what it is supposed to do.

Types of Evidence for Validity: investigated by analysis of test content or by a study of relationships between test scores and criterion variables,
independence of methods being a common denominator among the major types of validity excepting content
validity.



1. Concurrent: validity based upon correlation with a criterion
variable that is measured at about the same time as the test is
administered.

2. Congruent: evidence of validity obtained by correlating a test
with an existing similar measure of the same function (e.g.,
correlation of a new intelligence test with an existing
intelligence test).

3. Convergent: type of validity which requires a high correlation
between a test and other variables which logically are related
to the test; confirmation by independent measurement
procedures.

4. Factorial: a form of content validity; uses factor analysis to
determine to what extent a test measures certain content
areas; partitions true score variance into subcomponents which
indicate the extent to which each factor is a subcomponent.

5. Item: discriminative value of an item; correlation between an
item and some criterion of performance.

6. Statistical: evidence of test validity expressed numerically,
usually as correlation between scores on the test and another
set of measures such as scores on another test, teachers'
marks, ratings by experts, etc.

7. Validity Evidence: information gathered to determine exactly
what kind of inferences can be made from test scores.

8. Validity Generalization: process in which additional
information is obtained by checking the effectiveness of the test on a
differently defined population but using the same criterion as
in the original study.

9. Criterion-related: validity demonstrated by comparing test
scores with one or more external variables considered to
provide a direct measure of the characteristic or behavior in
question; correlation between test score and criterion
measure; test user wishes to forecast an individual's present or
future standing on some variable of particular significance
that is different from the test.

10. Differential: validity which depends on difference between
correlation of classification test (ideal test) with each of
separate criteria to be predicted; with a two-criterion
classification problem the ideal test would have a high correlation with
one criterion and a zero or negative correlation with the
other criterion.

11. Empirical: quality of test having definite and proved value for
a given purpose; usually stated in terms of correlation; extent
to which scores on a test agree with some outside criterion or
future measure of success.



12. Incremental: amount the test will add to validity of
predictions made on basis of data usually available; validity stated
in terms of some increment in predictive efficiency over
information otherwise easily and cheaply available.

13. Intrinsic: validity evidence based on fact that items in a test
are selected to stimulate the criterion item that the test is
used to predict.

14. Practical: validity of a test as determined by its ability to
predict within a certain sphere of behavior.

15. Predictive (of a test): validity based upon correlation with a
criterion variable that is not available until some time after
testing (e.g., school grades).

16. Synthetic: validity for which each predictor is validated, not
against a composite criterion but against job elements
identified through job analysis; the validity of any test for a given
job is then computed synthetically from the weights of these
elements in the job and in the test.

17. Validity Extension: process by which test validity is checked
against a new criterion as well as with a different population.

18. Concept: attempt to analyze the validity of broad concepts
in subject areas.

19. Construct: validity evaluated by investigating what qualities a
test measures, determining degree to which certain
explanatory concepts or constructs account for performance on the
test.

20. Content: validity demonstrated by showing how well the
content of the test samples the subject matter about which
conclusions are to be drawn; test user wishes to determine how
individual performs at present in a universe of situations that
test situation is claimed to represent.

21. Curricular: evidence of test validity indicated by agreement
between test content and curricular content and test
objectives and curricular objectives.

22. Face: validity referring to what a test appears to measure on
basis of subjective evaluation, not what it actually measures;
least justifiable of all evidences of validity.

23. Logical: estimate of content validity based on comparison of
behavior demanded by the test with the behavior that, by a
prior analysis, belongs to the variable to be measured.

24. Operational: ability of a test or measuring instrument to do
some task, defined in terms of operations it actually
performs (e.g., a yardstick is operationally valid for linear
measurement).



Taken from: CEDR Quarterly, Phi Delta Kappa,
Spring, 1974.
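
Several of the kinds defined in Figure I (concurrent, congruent, statistical, predictive) are stated as a correlation between test scores and a criterion. As a rough sketch of how such a validity coefficient is computed, the Python below calculates a Pearson correlation between mean course ratings and a criterion score. The numbers are invented, and the choice of an end-of-course examination as the criterion is an assumption for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mx) ** 2 for x in xs)
    sum_sq_y = sum((y - my) ** 2 for y in ys)
    if sum_sq_x == 0 or sum_sq_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross / sqrt(sum_sq_x * sum_sq_y)


# Invented data: mean rating per course section and mean score on a
# common examination taken at about the same time (a concurrent criterion).
mean_ratings = [3.1, 3.8, 4.2, 2.9, 4.6]
exam_scores = [62, 70, 78, 60, 85]

validity_coefficient = pearson_r(mean_ratings, exam_scores)
print(round(validity_coefficient, 3))
```

Whether the coefficient counts as concurrent or predictive evidence depends only on when the criterion is measured; the computation is the same.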
FIGURE I CONT'D

TREE DIAGRAM DELINEATING TYPES OF EVIDENCE FOR VALIDITY
INTO SUB-MAJOR AND MAJOR CATEGORIES

Specific Type               Sub-Major Type     Major Type

CONCURRENT                  CURRENT            OBJECTIVE
CONGRUENT
CONVERGENT
FACTORIAL *
ITEM
STATISTICAL
VALIDITY EVIDENCE
VALIDITY GENERALIZATION

CRITERION-RELATED 1         PREDICTIVE         OBJECTIVE
DIFFERENTIAL
EMPIRICAL
INCREMENTAL
INTRINSIC
PRACTICAL
PREDICTIVE
SYNTHETIC
VALIDITY EXTENSION

CONCEPT                     CONCEPT            SUBJECTIVE
CONSTRUCT

CONTENT                     CONTENT            SUBJECTIVE
CURRICULAR
FACE
LOGICAL
OPERATIONAL

* Though defined in the Dictionary as a form of Content validity, it utilizes quantitative techniques for justification of evidence.
1 Appropriate to both Current and Predictive: one may wish to determine current standing or forecast future position.
TABLE I. Some Validity Studies in the Student Ratings Literature (1)

AUTHOR(S)                     DATE          AREA

Halstead                      (1970)        A. Content Validity
Kent                          (1967)           (contents agreement,
Holmes                        (1971)           factor structure)
Davis, Hildebrand & Wilson    (1971)
Despande et al.               (1970)
Crawford & Bradshaw           (1968)
Coffman                       (1954)
Warrington                    (1973)
Costin                        (1968)
French                        (1957)
Tuckman                       (1973)
Gadzella                      (1968)
Hoyt                          (1973)
Musella and Rush              (1968)
Gagné & Chabot                (1970)
Perry                         (1969)
Perry & Baumann               (1973)
Downie                        (1952)
Aleamoni                      (1973)
Mann                          (1969)
Langen                        (1966)

1. Adapted from Nadeau, G. G. "Student Evaluation of Instruction:
The rating questionnaire," in Chris Knapper, George Geis, Charles
Pascal and Bruce Shore (eds.), Scaring the Ivory Tower: Appraising
College & University Teaching. Clarke Irwin, Toronto, 1975.
TABLE I. - CONT'D

AUTHOR(S)                     DATE          AREA

Kirchner                      (1969)        B. Bias in responses
Sockloff                      (1973)           (halo, leniency,
Hoyt                          (1969)           popularity, hostility,
Royce                         (1956)           etc.)
Widlak, McDaniel, Feldhusen   (1973)
Weaver                        (1960)
Remmers                       (1959)
Sharon                        (1970)
Guthrie                       (1954)
Centra                        (1973)
Aleamoni                      (1972-73)



AUTHOR(S)                     DATE          AREA

Cohen & Berger                              C. Correlates
Lathrop                       (1968)           1. Student characteristics:
Frey                          (1973)              sex, age, class standing,
Lathrop & Richmond            (1967)              class size, grades,
Bentley                       (1971)              basis for judgements,
McKeachie et al.              (1971)              achievement, etc.
Nichols & Soper               (1972)
Mann                          (1969)
Shuh & Crivelli               (1973)
Kohlan                        (1973)
Elliot                        (1950)
McClelland                    (1970)
Costin et al.                 (1971)
McKeachie and Solomon         (1958)
Perry & Baumann               (1973)
Rodin & Rodin                 (1972)
Whitely & Doyle               (1973)
McKeachie                     (1969, 1973)
Colliver                      (1972)
Granzin & Painter             (1973)
Whitely & Doyle               (1974)



TABLE I. - CONT'D 
AUTHOR(S)                     DATE          AREA

Mueller & Miller              (1970)        C. Correlates
Miller                        (1972)           1. Student characteristics:
Remmers                       (1960)              sex, age, etc.
Russell                       (1951)
Voeks & French                (1960)
Caffrey                       (1969)
Spencer                       (1968)
Rubenstein & Mitchell         (1970)
Treffinger & Feldhusen        (1970)
Walker                        (1969)
Yonge & Sassenrath            (1968)

Murray                        (1973)           2. Course characteristics
Nichols & Soper               (1972)              (type of course, content,
Remmers                       (1959)              difficulty, required vs
McKeachie                     (1973)              elective, level, time of
[?]arlman                     (1973)              class, etc.)
Gage                          (1961)
Villano et al.                (1974)
Miller                        (1972)
Guthrie                       (1954)
Eckert & Keller               (1954)
Lovell & Haner                (1955)
Clark & Keller                (1954)



AUTHOR(S)                     DATE          AREA

McKeachie                     (1973)        Instructor characteristics:
Remmers                       (1959)           sex, age, rank, degrees,
Costin et al.                 (1971)           experience, grading standards,
Riley et al.                  (1950)           knowledge of subject, research,
Stallings & Singhal           (196?)           knowledge of teaching,
McGrath                                        personality traits, popularity
Bressler                      (1968)           and change after feedback, etc.
TABLE I. - CONT'D

AUTHOR(S)                     DATE          AREA

Costin                        (1968)        Instructor characteristics:
Guthrie                       (1954)           sex, age, rank, etc.
Hayes                         (1971)
Voeks                         (1962)
McDaniel & Feldhusen          (1970)
Murray                        (1973)
Richardson                    (1973)
Isaacson et al.               (1963)
Clark & Blackburn             (1973)
Sorey                         (1968)
Sherman & Blackburn           (1974)
Miller M.                     (1971)
Thomas                        (1969)
Bentley                       (1971)
Centra                        (1972, 1973)
Aleamoni                      (1973)
Hoyt                          (1973)
Tuckman                       (1973)
Tuckman & Oliver              (1968)
Braunstein et al.             (1973)
Rayder                        (1968)






AUTHOR(S)                     DATE          AREA

Costin et al.                 (1971)        D. Student ratings versus
Murray                        (1973)           other raters: alumni,
Sockloff                      (1973)           colleagues, heads of
Hayes                         (1971)           departments, observers,
Drucker & Remmers             (1951)           etc.
Costin                        (1966)
Webb & Nolan                  (1955)
Perry                         (1969)
Wilson et al.                 (1973)
Gaff                          (1973)
Touq et al.                   (1973)
Centra                        (1973)
TABLE I. - CONT'D

AUTHOR(S)                     DATE          AREA

Braunstein & Benston          (1973)        D. Student ratings versus
Guthrie                       (1949, 1954)     other raters: alumni,
Maslow & Zimmerman            (1956)           etc.